Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism. The effectiveness of kNN-MT depends directly on the quality of the retrieved neighbors. However, the original kNN-MT builds datastores from the representations of NMT models, which results in poor retrieval accuracy when the NMT models are not good enough and leads to sub-optimal translation performance. In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT. Better representations from pre-trained models allow us to build datastores of better quality. We also design a novel contrastive alignment objective to mitigate the representation gap between the NMT model and the pre-trained models, enabling the NMT model to retrieve from the better datastores. We conduct extensive experiments on both bilingual and multilingual translation benchmarks, including WMT17 English $\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14 German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical results demonstrate the effectiveness of PRED.
translated by Google Translate
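The token-level retrieval step that kNN-MT relies on can be sketched as follows. This is a minimal illustration of the generic kNN-MT recipe (retrieve the nearest datastore entries, turn negative distances into a distribution, interpolate with the NMT distribution), not PRED's implementation. The function name `knn_mt_distribution`, the toy datastore, and the values of `k`, `temperature`, and `lam` are all assumptions made for this example.

```python
import math

def knn_mt_distribution(query, datastore, p_nmt, k=2, temperature=1.0, lam=0.5):
    """Interpolate an NMT token distribution with a kNN distribution.

    query:     decoder representation at the current step (tuple of floats)
    datastore: list of (key_vector, target_token) pairs (hypothetical toy format)
    p_nmt:     dict mapping token -> probability from the NMT model
    """
    # Squared L2 distance from the query to every key; keep the k nearest.
    scored = sorted(
        ((sum((q - x) ** 2 for q, x in zip(query, key)), token)
         for key, token in datastore),
        key=lambda pair: pair[0],
    )[:k]
    # Softmax over negative distances, aggregated per target token.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    p_knn = {}
    for weight, (_, token) in zip(weights, scored):
        p_knn[token] = p_knn.get(token, 0.0) + weight / total
    # Final distribution: lam * p_kNN + (1 - lam) * p_NMT.
    vocab = set(p_nmt) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_nmt.get(t, 0.0)
            for t in vocab}

# Toy usage: both nearest neighbors vote for "Haus".
toy_datastore = [((0.0, 0.0), "Haus"), ((0.1, 0.0), "Haus"), ((5.0, 5.0), "Heim")]
toy_p_nmt = {"Haus": 0.4, "Heim": 0.6}
toy_result = knn_mt_distribution((0.0, 0.0), toy_datastore, toy_p_nmt)
```

In this sketch the quality of the keys decides which tokens are retrieved, which is why the abstract argues that better representations give better datastores.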
Recent development of deep neural networks (DNNs) for tabular learning has largely benefited from the capability of DNNs for automatic feature interaction. However, the heterogeneous nature of tabular features makes such features relatively independent, and developing effective methods to promote tabular feature interaction remains an open problem. In this paper, we propose a novel Graph Estimator, which automatically estimates the relations among tabular features and builds graphs by assigning edges between related features. Such relation graphs organize independent tabular features into a kind of graph data, so that the interaction of nodes (tabular features) can be conducted in an orderly fashion. Based on our proposed Graph Estimator, we present a bespoke Transformer network tailored for tabular learning, called T2G-Former, which processes tabular data by performing tabular feature interaction guided by the relation graphs. A specific Cross-level Readout collects salient features predicted by the layers in T2G-Former across different levels and attains global semantics for the final prediction. Comprehensive experiments show that our T2G-Former achieves superior performance among DNNs and is competitive with non-deep Gradient Boosted Decision Tree models.
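One way to picture feature interaction guided by a relation graph is attention that is only allowed along graph edges. The sketch below is a hypothetical illustration of that idea and not the Graph Estimator or T2G-Former itself: the adjacency matrix is given by hand rather than estimated, and the name `masked_attention` is an assumption of this example.

```python
import math

def masked_attention(scores, adjacency):
    """Row-wise softmax over attention scores, restricted to graph edges.

    scores:    n x n list of raw interaction scores between features
    adjacency: n x n list of 0/1 flags; 1 means the two features are related
    """
    weights = []
    for row_scores, row_adj in zip(scores, adjacency):
        # Features without an edge contribute nothing to the interaction.
        exps = [math.exp(s) if a else 0.0 for s, a in zip(row_scores, row_adj)]
        total = sum(exps)
        weights.append([e / total if total > 0 else 0.0 for e in exps])
    return weights

# Toy usage: feature 0 interacts with itself and feature 1, never with feature 2.
toy_scores = [[0.0, 0.0, 3.0], [1.0, 0.5, 0.2], [0.3, 0.3, 0.3]]
toy_adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
toy_weights = masked_attention(toy_scores, toy_adjacency)
```

Masking before normalization keeps each row a valid distribution over the related features only.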
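Federated learning (FL) has emerged as a practical solution to the data silo problem that does not compromise user privacy. One of its variants, vertical federated learning (VFL), has recently attracted attention because VFL matches enterprises' demand to leverage more valuable features to build better machine learning models while preserving user privacy. Current work on VFL concentrates on developing a specific protection or attack mechanism for a particular VFL algorithm. In this work, we propose an evaluation framework that formulates the privacy-utility evaluation problem. We then use this framework as a guide to comprehensively evaluate a broad range of protection mechanisms against most state-of-the-art privacy attacks for three widely deployed VFL algorithms. These evaluations can help FL practitioners select appropriate protection mechanisms under specific requirements. Our evaluation results show that model inversion and most label inference attacks can be thwarted by existing protection mechanisms, whereas the model completion (MC) attack is difficult to prevent, which calls for more advanced MC-targeted protection mechanisms. Based on our evaluation results, we offer concrete advice on improving the privacy-preserving capability of VFL systems.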
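Federated learning (FL) enables independent parties to collaboratively build machine learning (ML) models while protecting data privacy. Vertical federated learning (VFL), a variant of FL, has recently attracted attention because VFL matches enterprises' demand to leverage more valuable features to achieve better model performance without compromising data privacy. However, conventional VFL may run into data deficiency, because it can only exploit labeled, aligned samples (belonging to different parties), often leaving the majority of unaligned and unlabeled samples unused. This data deficiency hampers the effort of the federation. In this work, we propose a Federated Hybrid Self-Supervised Learning framework, named FedHSSL, that utilizes all available data of the participants (including unaligned and unlabeled samples) to train the joint VFL model. The core idea of FedHSSL is to use the cross-party views (i.e., dispersed features) of samples aligned among parties and the local views (i.e., augmentations) of samples within each party to improve the representation learning capability of the joint VFL model through self-supervised learning (SSL), e.g., SimSiam. FedHSSL further exploits generic features shared among parties to boost the performance of the joint model through partial model aggregation. We empirically demonstrate that FedHSSL achieves significant performance gains compared with baseline methods, especially when the number of labeled samples is small. We provide an in-depth analysis of FedHSSL regarding privacy leakage, which is rarely discussed in existing self-supervised VFL work. We also investigate a protection mechanism for FedHSSL. The results show that our protection can thwart the state-of-the-art label inference attack.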
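Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustworthy machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, uncertainty estimation for transformer models remains under-explored. In this work, we propose a novel way to give transformers the capability of uncertainty estimation while retaining the original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to the values and to a set of learnable centroids, respectively. New attention heads are then formed from a mixture of sampled centroids using the Gumbel-Softmax trick. We show theoretically that the self-attention approximation obtained by sampling from a Gumbel distribution is upper bounded. We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets. The experimental results show that our approach: (1) achieves the best trade-off between predictive performance and uncertainty among the compared methods; (2) exhibits very competitive (in most cases, improved) predictive performance on ID datasets; and (3) is on par with Monte Carlo dropout and ensemble methods in uncertainty estimation on OOD datasets.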
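The Gumbel-Softmax trick mentioned above can be sketched in a few lines. This shows only the generic trick (perturb the logits with Gumbel noise, then apply a temperature-scaled softmax). It does not reproduce the hierarchical stochastic attention or the centroid mixture described in the abstract, and the function name and the value of `tau` are assumptions of this example.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed categorical sample from unnormalized logits."""
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(perturbed)
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage: a seeded generator makes the stochastic sample reproducible.
toy_sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
```

Lower values of `tau` push the sample toward a one-hot vector, while higher values make it smoother. Repeated samples differ, and that stochasticity is the kind of signal an uncertainty estimate can be built on.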
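Federated learning (FL) aims to protect data privacy by enabling clients to collaboratively build machine learning models without sharing their private data. However, recent work shows that FL is vulnerable to gradient-based data recovery attacks. A variety of privacy-preserving techniques have been leveraged to further enhance the privacy of FL. Nonetheless, they are either computationally or communication expensive (e.g., homomorphic encryption) or suffer from precision loss (e.g., differential privacy). In this work, we propose FedCG, a novel federated learning method that leverages conditional generative adversarial networks to achieve high-level privacy protection while still maintaining competitive model performance. More specifically, FedCG decomposes each client's local network into a private extractor and a public classifier, and keeps the extractor local to protect privacy. Instead of exposing the extractors, which are the culprit of privacy leakage, FedCG shares the clients' generators with the server to aggregate common knowledge, aiming to enhance the performance of the clients' local networks. Extensive experiments demonstrate that FedCG achieves competitive model performance compared with baseline FL methods, and a numerical privacy analysis shows that FedCG has a high-level privacy-preserving capability.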
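Medical dialogue systems (MDSs) aim to assist doctors and patients with a range of professional medical services, i.e., diagnosis, consultation, and treatment. However, a one-stop MDS is still unexplored because: (1) no dataset offers such large-scale dialogues containing both multiple medical services and fine-grained medical labels (i.e., intents, slots, values); and (2) no model has addressed an MDS based on multi-service conversations in a unified framework. In this work, we first build a Multiple-domain Multiple-service medical dialogue (M^2-MedDialog) dataset, which contains 1,557 conversations between doctors and patients, covering 276 types of diseases, 2,468 medical entities, and 3 specialties of medical services. To the best of our knowledge, it is the only medical dialogue dataset that includes both multiple medical services and fine-grained medical labels. We then formulate a one-stop MDS as a sequence-to-sequence generation problem. We unify an MDS with causal language modeling and with conditional causal language modeling, respectively. Specifically, we employ several pre-trained models (i.e., BERT-WWM, BERT-MED, GPT2, and MT5) and their variants to obtain benchmarks on the M^2-MedDialog dataset. We also propose pseudo-labeling and natural perturbation methods to expand the M^2-MedDialog dataset and enhance the state-of-the-art pre-trained models. We present the results achieved so far through extensive experiments on M^2-MedDialog. We release the dataset, the code, and the evaluation scripts to facilitate future research in this direction.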
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
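The attack success rate (ASR) reported above is commonly computed as the fraction of trigger-stamped inputs that the model assigns to the attacker's target class. The sketch below uses that common definition and additionally excludes samples whose true label already equals the target. The paper's exact evaluation protocol may differ, and the function name and toy labels are assumptions of this example.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Share of triggered, non-target-class samples predicted as the target.

    predictions: model outputs on inputs that carry the backdoor trigger
    true_labels: ground-truth labels of those same inputs
    """
    # Samples that already belong to the target class say nothing about the attack.
    eligible = [pred for pred, label in zip(predictions, true_labels)
                if label != target_label]
    if not eligible:
        return 0.0
    return sum(pred == target_label for pred in eligible) / len(eligible)

# Toy usage: 2 of the 3 eligible triggered samples are pushed to class 0.
toy_asr = attack_success_rate([0, 0, 1, 0], [1, 2, 1, 0], target_label=0)
```

An ASR close to 1.0, as reported for DOORPING, means that nearly every triggered input is classified as the target class.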
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that, by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, for which there is little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.